Abstract. Most earlier studies of DHTs under churn have
either depended on simulations as the primary investigation
tool, or on establishing bounds for DHTs to function. In this
paper, we present a complete analytical study of churn using
a master-equation-based approach, used traditionally in
non-equilibrium statistical mechanics to describe steady-state or
transient phenomena. Simulations are used to verify all
theoretical predictions. We demonstrate the application of our
methodology to the Chord system. For any rate of churn and
stabilization rates, and any system size, we accurately predict
the fraction of failed or incorrect successor and finger
pointers and show how we can use these quantities to predict the
performance and consistency of lookups under churn. We also
discuss briefly how churn may actually be of different 'types'
and the implications this will have for the functioning of DHTs
in general.

1 Introduction

Theoretical studies of asymptotic performance bounds of
DHTs under churn have been conducted in works like [?].
However, within these bounds, performance can vary
substantially as a function of different design decisions and
configuration parameters. Hence simulation-based studies such as
[?] often provide more realistic insights into the
performance of DHTs. Relying on an understanding based on
simulations alone is however not satisfactory either, since in this
case, the DHT is treated as a black box and is only empirically
evaluated, under certain operation conditions. In this paper we
present an alternative theoretical approach to analyzing and
understanding DHTs, which aims for an accurate prediction of
performance, rather than placing asymptotic performance
bounds. Simulations are then used to verify all theoretical
predictions.

Our approach is based on constructing and working with 
master equations, a widely used tool wherever the mathemati- 
cal theory of stochastic processes is applied to real-world phe- 
nomena [7]. We demonstrate the applicability of this approach 
to one specific DHT: Chord [9]. For Chord, it is natural to
define the state of the system as the state of all its nodes, where
the state of an alive node is specified by the states of all its
pointers. These pointers (either fingers or successors) are then
in one of three states: alive and correct, alive and incorrect or
failed. A master equation for this system is simply an equa- 
tion for the time evolution of the probability that the system is 
in a particular state. Writing such an equation involves keep- 
ing track of all the gain/loss terms which add/detract from this 
probability, given the details of the dynamics. This approach 
is applicable to any P2P system (or indeed any system with a 
discrete set of states). 

Our main result is that, for every outgoing pointer of a Chord 
node, we systematically compute the probability that it is in 
any one of the three possible states, by computing all the gain 
and loss terms that arise from the details of the Chord proto- 
col under churn. This probability is different for each of the 
successor and finger pointers. We then use this information to 
predict both lookup consistency (number of failed lookups) as 
well as lookup performance (latency) as a function of the pa- 
rameters involved. All our results are verified by simulations. 

The main novelty of our analysis is that it is carried out
entirely from first principles, i.e. all quantities are predicted solely
as a function of the parameters of the problem: the churn rate,
the stabilization rate and the number of nodes in the system. It
thus differs from earlier related theoretical studies where
quantities similar to those we predict were either assumed to be
given [10], or measured numerically [?].

Closest in spirit to our work is the informal derivation in
the original Chord paper [9] of the average number of
timeouts encountered by a lookup. This quantity was approximated
there by the product of the average number of fingers used in
a lookup times the probability that a given finger points to a
departed node. Our methodology not only allows us to
derive the latter quantity rigorously but also demonstrates how
this probability depends on which finger (or successor) is
involved. Further, we are able to derive an exact relation between
this probability and lookup performance and consistency, valid
at any value of the system parameters.

2 Assumptions & Definitions 

Basic Notation. In what follows, we assume that the reader is
familiar with Chord. However, we introduce the notation used
below. We use $\mathcal{K}$ to mean the size of the Chord key space and
$N$ the number of nodes. Let $\mathcal{M} = \log_2 \mathcal{K}$ be the number of
fingers of a node and $S$ the length of the immediate successor list,
usually set to a value $= O(\log N)$. We refer to nodes by their
keys, so a node $n$ implies a node with key $n \in 0 \ldots \mathcal{K} - 1$. We
use $p$ to refer to the predecessor, $s$ for referring to the successor
list as a whole, and $s_i$ for the $i^{th}$ successor. Data structures of
different nodes are distinguished by prefixing them with a node
key, e.g. $n'.s_1$, etc. Let $fin_i.start$ denote the start of the $i^{th}$
finger (where for a node $n$, $\forall i \in 1..\mathcal{M}$, $n.fin_i.start = n + 2^{i-1}$)
and $fin_i.node$ denote the actual node pointed to by that finger.
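
As a small illustration of the finger notation, the following Python sketch lists the finger starts $n + 2^{i-1}$ of a node. The function name `finger_starts` is hypothetical, and the wrap-around modulo $\mathcal{K}$ is an assumption that reflects the ring structure of the key space rather than something stated in the definition above.

```python
def finger_starts(n, num_fingers):
    """Return the start keys of the fingers of node n.

    Assumes a key space of size K = 2**num_fingers (so that M = log2 K)
    and wraps around the ring with a modulo, which is an assumption here.
    """
    key_space = 2 ** num_fingers
    return [(n + 2 ** (i - 1)) % key_space for i in range(1, num_fingers + 1)]


# Node 5 in a toy key space with K = 16 and M = 4 fingers.
print(finger_starts(5, 4))  # [6, 7, 9, 13]
```

The toy key space is only for readability; the simulations described below use a much larger $\mathcal{K}$.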

Steady State Assumption. $\lambda_j$ is the rate of joins per node,
$\lambda_f$ the rate of failures per node and $\lambda_s$ the rate of stabilizations
per node. We carry out our analysis for the general case when
the rate of doing successor stabilizations, $\alpha\lambda_s$, is not necessarily
the same as the rate at which finger stabilizations, $(1 - \alpha)\lambda_s$,
are performed. In all that follows, we impose the steady state
condition $\lambda_j = \lambda_f$. Further, it is useful to define $r = \lambda_s/\lambda_f$, which
is the relevant ratio on which all the quantities we are interested
in will depend, e.g., $r = 50$ means that a join/fail event takes
place every half an hour for a stabilization which takes place
once every 36 seconds.
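
The arithmetic behind the example can be checked with a short Python sketch. It assumes that $r$ is the ratio of the stabilization rate to the failure rate, which is what the half-hour versus 36-second example implies; the function name `churn_ratio` is hypothetical.

```python
def churn_ratio(seconds_between_failures, seconds_between_stabilizations):
    """Ratio r of stabilization rate to failure rate (assumed definition).

    Rates are the inverses of the mean intervals, so the ratio of rates
    equals the inverse ratio of the intervals.
    """
    return seconds_between_failures / seconds_between_stabilizations


# One join/fail event every half an hour, one stabilization every 36 seconds.
print(churn_ratio(30 * 60, 36))  # 50.0
```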

Parameters. The parameters of the problem are hence: $\mathcal{K}$,
$N$, $\alpha$ and $r$. All relevant measurable quantities should be
entirely expressible in terms of these parameters.

Chord Simulation. We use our own discrete event
simulation environment implemented in Java which can be retrieved
from [4]. We assume the familiarity of the reader with Chord,
however an exact analysis necessitates the provision of a few
details. Successor stabilizations performed by a node $n$ on $n.s_1$
accomplish two main goals: i) Retrieving the predecessor and
successor list of $n.s_1$ and reconciling with $n$'s state, ii)
Informing $n.s_1$ that $n$ is alive/newly joined. A finger
stabilization picks one finger at random and looks up its start. Lookups
do not use the optimization of checking the successor list
before using the fingers. However, the successor list is used as a
last resort if fingers could not provide progress. Lookups are
assumed not to change the state of a node. For joins, a new
node $u$ finds its successor $v$ through some initial random
contact and performs successor stabilization on that successor. All
fingers of $u$ that have $v$ as an acceptable finger node are set to $v$.
The rest of the fingers are computed as best estimates from $v$'s
routing table. All failures are ungraceful. We make the
simplifying assumption that communication delays due to a limited
number of hops are much smaller than the average time interval
between joins, failures or stabilization events. However, we do
not expect that the results will change much even if this were
not satisfied.

Averaging. Since we are collecting statistics like the
probability of a particular finger pointer to be wrong, we need to
repeat each experiment 100 times before obtaining well-averaged
results. The total sequential simulation time for obtaining
the results of this paper was about 1800 hours, parallelized
on a cluster of 14 nodes, with $N = 1000$,
$\mathcal{K} = 2^{20}$, $S = 6$, $200 \le r \le 2000$ and $0.25 \le \alpha \le 0.75$.

3 The Analysis 

3.1 Distribution of Inter-Node Distances 

During churn, the inter-node distance (the difference between 
the keys of two consecutive nodes) is a fluctuating variable. An 
important quantity used throughout the analysis is the pdf of 
inter-node distances. We define this quantity below and state 
a theorem giving its functional form. We then mention three 
properties of this distribution which are needed in the ensuing 
analysis. Due to space limitations, we omit the proof of this
theorem and the properties here and provide them in [?].

Definition 3.1 Let $Int(x)$ be the number of intervals of length
$x$, i.e. the number of pairs of consecutive nodes which are
separated by a distance of $x$ keys on the ring.

Theorem 3.1 For a process in which nodes join or leave with
equal rates (and the number of nodes in the network is almost
constant) independently of each other and uniformly on the
ring, the probability ($P(x) \equiv Int(x)/N$) of finding an interval
of length $x$ is:

$$P(x) = \rho^{x-1}(1 - \rho) \quad \text{where} \quad \rho = \frac{\mathcal{K} - N}{\mathcal{K}} \quad \text{and} \quad 1 - \rho = \frac{N}{\mathcal{K}}$$

The derivation of the distribution P(x) is independent of any 
details of the Chord implementation and depends solely on the 
join and leave process. It is hence applicable to any DHT that 
deploys a ring. 
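
The geometric form of $P(x)$ can be compared against a quick numerical experiment. The Python sketch below places $N$ nodes uniformly at random on a ring of $\mathcal{K}$ keys and histograms the inter-node distances. It is only a static snapshot used as a stand-in for the join/leave process of the theorem, and the function names are hypothetical.

```python
import random
from collections import Counter


def interval_histogram(key_space, num_nodes, seed=0):
    """Empirical fraction of intervals of each length x on a random ring."""
    rng = random.Random(seed)
    nodes = sorted(rng.sample(range(key_space), num_nodes))
    # Distance from each node to the next one, wrapping around the ring.
    gaps = [(nodes[(i + 1) % num_nodes] - nodes[i]) % key_space
            for i in range(num_nodes)]
    counts = Counter(gaps)
    return {x: c / num_nodes for x, c in counts.items()}


def geometric_pdf(x, key_space, num_nodes):
    """P(x) = rho**(x-1) * (1 - rho) with rho = (K - N) / K."""
    rho = (key_space - num_nodes) / key_space
    return rho ** (x - 1) * (1 - rho)


K, N = 2 ** 16, 1000
hist = interval_histogram(K, N)
for x in (1, 10, 50, 100):
    print(x, hist.get(x, 0.0), geometric_pdf(x, K, N))
```

With only 1000 intervals the empirical values are noisy, so the comparison is best done after averaging over many seeds or after binning neighbouring lengths.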

Property 3.1 For any two keys $u$ and $v$, where $v = u + x$,
let $b_i$ be the probability that the first node encountered in
between these two keys is at $u + i$ (where $0 \le i \le x - 1$). Then
$b_i = \rho^i(1 - \rho)$. The probability that there is definitely at least
one node between $u$ and $v$ is: $a(x) = 1 - \rho^x$. Hence the
conditional probability that the first node is at a distance $i$ given that
there is at least one node in the interval is $bc(i, x) = b(i)/a(x)$.
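
The three quantities of this property are simple enough to write down directly. The Python sketch below does so and checks numerically that the $b_i$ sum to $a(x)$, so that $bc(i, x)$ is normalized. The argument `rho` stands for the quantity $\rho$ of Theorem 3.1, and the index range $0 \le i \le x - 1$ is assumed.

```python
def b(i, rho):
    """Probability that the first node after key u sits at u + i."""
    return rho ** i * (1 - rho)


def a(x, rho):
    """Probability that at least one node lies in an interval of x keys."""
    return 1 - rho ** x


def bc(i, x, rho):
    """b(i) conditioned on the interval of length x being non-empty."""
    return b(i, rho) / a(x, rho)


rho = 1 - 1000 / 2 ** 20   # N = 1000 nodes, K = 2**20 keys
x = 64
print(sum(b(i, rho) for i in range(x)), a(x, rho))
print(sum(bc(i, x, rho) for i in range(x)))
```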

Property 3.2 The probability that a node and at least one
of its immediate predecessors share the same $k^{th}$ finger is
$p_1(k) = \frac{\rho}{1+\rho}(1 - \rho^{2^k - 2})$. This is $\sim 1/2$ for $\mathcal{K} \gg 1$ and
$N \ll \mathcal{K}$. Clearly $p_1 = 0$ for $k = 1$. It is straightforward
(though tedious) to derive similar expressions for $p_2(k)$, the
probability that a node and at least two of its immediate
predecessors share the same $k^{th}$ finger, $p_3(k)$ and so on.
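
One way to make the closed form for $p_1(k)$ plausible is to compare it with a direct sum over predecessor distances. The Python sketch below assumes the following reading, which is not spelled out in the text: the predecessor sits at distance $x$ with probability $\rho^{x-1}(1-\rho)$, both nodes share the $k^{th}$ finger if the $x$ keys between their two finger starts are empty (probability $\rho^x$), and $x$ ranges over $1 \ldots 2^{k-1} - 1$. The function names are hypothetical.

```python
def p1_closed(k, rho):
    """Closed form rho / (1 + rho) * (1 - rho**(2**k - 2))."""
    return rho / (1 + rho) * (1 - rho ** (2 ** k - 2))


def p1_sum(k, rho):
    """Direct sum over predecessor distances x = 1 .. 2**(k-1) - 1.

    Assumed reading: predecessor at distance x, times the probability
    that no node lies between the two k-th finger starts.
    """
    return sum(rho ** (x - 1) * (1 - rho) * rho ** x
               for x in range(1, 2 ** (k - 1)))


rho = 1 - 1000 / 2 ** 20
for k in (1, 2, 8, 12, 16):
    print(k, p1_closed(k, rho), p1_sum(k, rho))
```

Under this reading the two expressions agree for every $k$, and both approach $1/2$ for long fingers when $N \ll \mathcal{K}$.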

Property 3.3 We can similarly assess the probability that the
join protocol (see previous section) results in further
replication of the $k^{th}$ pointer. That is, the probability that a newly
joined node will choose the $k^{th}$ entry of its successor's finger
table as its own $k^{th}$ entry is
$p_{join}(k) \sim \rho(1 - \rho^{2^{k-2}-1}) + (1 - \rho)(1 - \rho^{2^{k-2}-2}) - (1 - \rho)\rho(2^{k-2} - 2)\rho^{2^{k-2}-3}$.
The function $p_{join}(k) = 0$ for small $k$ and $1$ for large $k$.

Figure 1: Theory and Simulation for $w_1(r,\alpha)$, $d_1(r,\alpha)$, $I(r,\alpha)$

Figure 2: Changes in $W_1$, the number of wrong (failed or
outdated) $s_1$ pointers, due to joins, failures and stabilizations.

3.2 Successor Pointers 

In order to get a master-equation description which keeps all 
the details of the system and is still tractable, we make the 
ansatz that the state of the system is the product of the states 
of its nodes, which in turn is the product of the states of all 
its pointers. As we will see this ansatz works very well. Now 
we need only consider how many kinds of pointers there are 
in the system and the states these can be in. Consider first the 
successor pointers. 

Let $w_k(r,\alpha)$, $d_k(r,\alpha)$ denote the fraction of nodes
having a wrong $k^{th}$ successor pointer or a failed one respectively
and $W_k(r,\alpha)$, $D_k(r,\alpha)$ be the respective numbers. A failed
pointer is one which points to a departed node and a wrong
pointer points either to an incorrect node (alive but not correct)
or a dead one. As we will see, both these quantities play a role
in predicting lookup consistency and lookup length.

By the protocol for stabilizing successors in Chord, a node
periodically contacts its first successor, possibly correcting it
and reconciling with its successor list. Therefore, the numbers
of wrong $k^{th}$ successor pointers are not independent quantities
but depend on the number of wrong first successor pointers.
We consider only $s_1$ here.

Change in $W_1(r,\alpha)$               Rate of Change
$W_1(t+\Delta t) = W_1(t) + 1$       $c_1 = (\lambda_j \Delta t)(1 - w_1)$
$W_1(t+\Delta t) = W_1(t) + 1$       $c_2 = \lambda_f (1 - w_1)^2 \Delta t$
$W_1(t+\Delta t) = W_1(t) - 1$       $c_3 = \lambda_f w_1^2 \Delta t$
$W_1(t+\Delta t) = W_1(t) - 1$       $c_4 = \alpha \lambda_s w_1 \Delta t$
$W_1(t+\Delta t) = W_1(t)$           $1 - (c_1 + c_2 + c_3 + c_4)$

Table 1: Gain and loss terms for $W_1(r,\alpha)$: the number of
wrong first successors as a function of $r$ and $\alpha$.

We write an equation for $W_1(r,\alpha)$ by accounting for all the
events that can change it in a micro event of time $\Delta t$. An
illustration of the different cases in which changes in $W_1$ take place
due to joins, failures and stabilizations is provided in figure 2.
In some cases $W_1$ increases/decreases while in others it stays
unchanged. For each increase/decrease, table 1 provides the
corresponding probability.

By our implementation of the join protocol, a new node $n_y$,
joining between two nodes $n_x$ and $n_z$, has its $s_1$ pointer always
correct after the join. However the state of $n_x.s_1$ before the join
makes a difference. If $n_x.s_1$ was correct (pointing to $n_z$) before
the join, then after the join it will be wrong and therefore $W_1$
increases by 1. If $n_x.s_1$ was wrong before the join, then it will
remain wrong after the join and $W_1$ is unaffected. Thus, we
need to account for the former case only. The probability that
$n_x.s_1$ is correct is $1 - w_1$ and from that follows the term $c_1$.

For failures, we have 4 cases. To illustrate them we use
nodes $n_x$, $n_y$, $n_z$ and assume that $n_y$ is going to fail. First,
if both $n_x.s_1$ and $n_y.s_1$ were correct, then the failure of $n_y$
will make $n_x.s_1$ wrong and hence $W_1$ increases by 1.
Second, if $n_x.s_1$ and $n_y.s_1$ were both wrong, then the failure of $n_y$
will decrease $W_1$ by one, since one wrong pointer disappears.
Third, if $n_x.s_1$ was wrong and $n_y.s_1$ was correct, then $W_1$ is
unaffected. Fourth, if $n_x.s_1$ was correct and $n_y.s_1$ was wrong,
then the wrong pointer of $n_y$ disappeared and $n_x.s_1$ became
wrong, therefore $W_1$ is unaffected. For the first case to happen,
we need to pick two nodes with correct pointers, the
probability of this is $(1 - w_1)^2$. For the second case to happen, we need
to pick two nodes with wrong pointers, the probability of this
is $w_1^2$. From these probabilities follow the terms $c_2$ and $c_3$.

Finally, a successor stabilization does not affect $W_1$, unless
the stabilizing node had a wrong pointer. The probability of
picking such a node is $w_1$. From this follows the term $c_4$.

Hence the equation for $W_1(r,\alpha)$ is:

$$\frac{dW_1}{dt} = \lambda_j(1 - w_1) + \lambda_f(1 - w_1)^2 - \lambda_f w_1^2 - \alpha\lambda_s w_1$$

Solving for $w_1$ in the steady state and putting $\lambda_j = \lambda_f$, we get:

$$w_1(r,\alpha) = \frac{2}{3 + r\alpha} \approx \frac{2}{r\alpha} \qquad (1)$$

This expression matches well with the simulation results as
shown in figure 1. $d_1(r,\alpha)$ is then $\approx \frac{1}{2} w_1(r,\alpha)$ since when
$\lambda_j = \lambda_f$, about half the number of wrong pointers are incorrect
and about half point to dead nodes. Thus $d_1(r,\alpha) \approx \frac{1}{r\alpha}$, which
also matches well the simulations as shown in figure 1. We can
also use the above reasoning to iteratively get $w_k(r,\alpha)$ for any
$k$.
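
The steady state can also be reached numerically by integrating the gain and loss terms forward in time. The Python sketch below is a minimal illustration and not the simulator used for the figures: it works with the fraction $w_1$ in units where $\lambda_j = \lambda_f = 1$ (so that $\lambda_s = r$), uses a plain Euler step whose size `dt` is an arbitrary choice, and the function names are hypothetical.

```python
def w1_steady_state(r, alpha):
    """Steady-state fraction of wrong first successors, 2 / (3 + r*alpha)."""
    return 2.0 / (3.0 + r * alpha)


def integrate_w1(r, alpha, w0=0.0, dt=1e-4, steps=200_000):
    """Euler integration of the gain/loss terms for w1.

    Time is measured in units of 1/lambda_f with lambda_j = lambda_f,
    so the stabilization term is alpha * r * w1.
    """
    w = w0
    for _ in range(steps):
        gain = (1 - w) + (1 - w) ** 2      # join term + failure gain term
        loss = w ** 2 + alpha * r * w      # failure loss + stabilization
        w += dt * (gain - loss)
    return w


for r, alpha in ((50, 0.5), (200, 0.25), (2000, 0.75)):
    print(r, alpha, integrate_w1(r, alpha), w1_steady_state(r, alpha))
```

Because the quadratic terms cancel, the right-hand side is linear in $w_1$ and the integration relaxes exponentially to the closed-form value from any starting point.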

Lookup Consistency. By the lookup protocol, a lookup is
inconsistent if the immediate predecessor of the sought key
has a wrong $s_1$ pointer. However, we need only consider the
case when the $s_1$ pointer is pointing to an alive (but incorrect)
node since our implementation of the protocol always requires
the lookup to return an alive node as an answer to the query.
The probability that a lookup is inconsistent, $I(r,\alpha)$, is hence
$w_1(r,\alpha) - d_1(r,\alpha)$. This prediction matches the simulation
results very well, as shown in figure 1.

3.3 Failure of Fingers 

We now turn to estimating the fraction of finger pointers which 
point to failed nodes. As we will see this is an important quan- 
tity for predicting lookups. Unlike members of the successor 
list, alive fingers even if outdated, always bring a query closer 
to the destination and do not affect consistency. Therefore we 
consider fingers in only two states, alive or dead (failed). 

Let $f_k(r,\alpha)$ denote the fraction of nodes having their $k^{th}$
finger pointing to a failed node and $F_k(r,\alpha)$ denote the respective
number. For notational simplicity, we write these as simply $F_k$
and $f_k$. We can predict this function for any $k$ by again
estimating the gain and loss terms for this quantity, caused by a
join, failure or stabilization event, and keeping only the most
relevant terms. These are listed in table 2.

Figure 4: Changes in the number of failed $fin_k$ pointers,
due to joins, failures and stabilizations.

Table 2: Some of the relevant gain and loss terms for $F_k$, the
number of nodes whose $k^{th}$ fingers are pointing to a failed
node for $k > 1$.

A join event can play a role here by increasing the number
of $F_k$ pointers if the successor of the joinee had a failed $k^{th}$
pointer (occurs with probability $f_k$) and the joinee replicated
this from the successor (occurs with probability $p_{join}(k)$ from
property 3.3).

A stabilization evicts a failed pointer if there was one to
begin with. The stabilization rate is divided by $\mathcal{M}$ since a node
stabilizes any one finger randomly, every time it decides to
stabilize a finger at rate $(1 - \alpha)\lambda_s$.

Given a node $n$ with an alive $k^{th}$ finger (occurs with
probability $1 - f_k$), when the node pointed to by that finger fails,
the number of failed $k^{th}$ fingers ($F_k$) increases. The amount
of this increase depends on the number of immediate
predecessors of $n$ that were pointing to the failed node with their $k^{th}$
finger. That number of predecessors could be 0, 1, 2, etc.
Using property 3.2, the respective probabilities of those cases are:
$1 - p_1(k)$, $p_1(k) - p_2(k)$, $p_2(k) - p_3(k)$, etc.

Figure 3: Theory and Simulation for $f_k(r,\alpha)$, and $L(r,\alpha)$



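Solving for $f_k$ in the steady state, we get:

$$f_k = \frac{2P_{rep}(k) + 2 - p_{join}(k) + \frac{r(1-\alpha)}{\mathcal{M}}}{2(1 + P_{rep}(k))} - \frac{\sqrt{\left(2P_{rep}(k) + 2 - p_{join}(k) + \frac{r(1-\alpha)}{\mathcal{M}}\right)^2 - 4(1 + P_{rep}(k))^2}}{2(1 + P_{rep}(k))} \qquad (2)$$

where $P_{rep}(k) = \sum_i p_i(k)$. In principle it is enough to keep
even three terms in the sum. The above expressions match very
well with the simulation results (figure 3).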
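
To evaluate the steady-state fraction of failed fingers numerically, one can treat it as the smaller root of a quadratic. The Python sketch below assumes that reading, namely $(1 + P_{rep}) f_k^2 - B f_k + (1 + P_{rep}) = 0$ with $B = 2P_{rep} + 2 - p_{join} + r(1-\alpha)/\mathcal{M}$, which is what balancing the join, failure and stabilization terms described above yields. The function name and the sample values of `p_rep` and `p_join` are hypothetical inputs, not values from our simulations.

```python
import math


def failed_finger_fraction(p_rep, p_join, r, alpha, num_fingers):
    """Smaller root of (1+P) f^2 - B f + (1+P) = 0 (assumed form).

    p_rep  : sum over i of p_i(k), the replication probabilities
    p_join : probability that a joinee copies its successor's k-th finger
    """
    a_coef = 1.0 + p_rep
    b_coef = 2.0 * p_rep + 2.0 - p_join + r * (1.0 - alpha) / num_fingers
    disc = b_coef * b_coef - 4.0 * a_coef * a_coef
    if disc < 0:
        raise ValueError("no real steady state for these parameters")
    return (b_coef - math.sqrt(disc)) / (2.0 * a_coef)


# Illustrative inputs only: a long finger (p_join close to 1) at r = 200.
f = failed_finger_fraction(p_rep=0.9, p_join=1.0, r=200, alpha=0.5,
                           num_fingers=20)
print(f)
```

The smaller root is the physically relevant one, since the larger root exceeds 1 and cannot be a fraction of nodes.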

3.4 Cost of Finger Stabilizations and Lookups 

In this section, we demonstrate how the information about the 
failed fingers and successors can be used to predict the cost 
of stabilizations, lookups or in general the cost for reaching 
any key in the id space. By cost we mean the number of hops 
needed to reach the destination including the number of time- 
outs encountered en-route. For this analysis, we consider time- 
outs and hops to add equally to the cost. We can easily gener- 
alize this analysis to investigate the case when a timeout costs 
some factor n times the cost of a hop. 

Define $C_t(r,\alpha)$ (also denoted $C_t$) to be the expected cost for
a given node to reach some target key which is $t$ keys away
from it (which means reaching the first successor of this key).
For example, $C_1$ would then be the cost of looking up the
adjacent key (1 key away). Since the adjacent key is always stored
at the first alive successor, if the first successor is alive
(occurs with probability $1 - d_1$), the cost will be 1 hop. If the
first successor is dead but the second is alive (occurs with
probability $d_1(1 - d_2)$), the cost will be 1 hop + 1 timeout = 2,
contributing $2 \times d_1(1 - d_2)$ to the expected cost, and so forth.
Therefore, we have
$C_1 = 1 - d_1 + 2 \times d_1(1 - d_2) + 3 \times d_1 d_2(1 - d_3) + \ldots \approx 1 + d_1 = 1 + 1/(r\alpha)$.
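
The series for $C_1$ is easy to sum numerically. The Python sketch below does so for a given list of failure probabilities $d_1, d_2, \ldots$ of the successor pointers. Setting all $d_k$ equal to $d_1$ in the example is purely an illustrative assumption, since only $d_1$ is derived explicitly above, and the function name is hypothetical.

```python
def expected_adjacent_cost(d):
    """Sum_j j * d_1 * ... * d_{j-1} * (1 - d_j) over the successor list.

    d is a list [d_1, d_2, ...]; entry j-1 is the probability that the
    j-th successor pointer is dead. The tail beyond the list is dropped.
    """
    cost = 0.0
    all_previous_dead = 1.0
    for j, d_j in enumerate(d, start=1):
        cost += j * all_previous_dead * (1.0 - d_j)
        all_previous_dead *= d_j
    return cost


r, alpha = 200, 0.5
d1 = 1.0 / (3.0 + r * alpha)
print(expected_adjacent_cost([d1] * 6), 1 + d1)
```

For small $d_1$ the two printed values differ only at second order in $d_1$, which is the approximation used in the text.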

For finding the expected cost of reaching a general distance
$t$ we need to follow closely the Chord protocol, which would
look up $t$ by first finding the closest preceding finger. For
notational simplicity, let us define $\xi$ to be the start of the finger
(say the $k^{th}$) that most closely precedes $t$. Thus $t = \xi + m$,
i.e. there are $m$ keys between the sought target $t$ and the start
of the most closely preceding finger. With that, we can write a
recursion relation for $C_{\xi+m}$ as follows:



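$$\begin{aligned}
C_{\xi+m} ={}& C_\xi\,[1 - a(m)] \\
&+ (1 - f_k)\,a(m)\left[1 + \sum_{i=1}^{m} b_{m+1-i}\,C_i\right] \\
&+ f_k\,a(m)\left[1 + \sum_{i=1}^{k-1} h_k(i) \sum_{l=1}^{\xi/2^i} bc(l, \xi/2^i)\,(1 + C_{\xi_i+1-l+m}) + 2h_k(k)\right] \qquad (3)
\end{aligned}$$

where $\xi_i \equiv \sum_{m=1}^{i} \xi/2^m$ and $h_k(i)$ is the probability that
a node is forced to use its $(k-i)^{th}$ finger owing to the death
of its $k^{th}$ finger. The probabilities $a$, $b$, $bc$ have already been
introduced in section 3.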

The lookup equation, though rather complicated at first sight,
merely accounts for all the possibilities that a Chord lookup
will encounter, and deals with them exactly as the protocol
dictates. The first term accounts for the eventuality that there is no
node intervening between $\xi$ and $\xi + m$ (occurs with
probability $1 - a(m)$). In this case, the cost of looking for $\xi + m$ is
the same as the cost for looking for $\xi$. The second term
accounts for the situation when a node does intervene in between
(with probability $a(m)$), and this node is alive (with probability
$1 - f_k$). Then the query is passed on to this node (with 1 added
to register the increase in the number of hops) and then the cost
depends on the length of the distance between this node and $t$.
The third term accounts for the case when the intervening node
is dead (with probability $f_k$). Then the cost increases by 1 (for
a timeout) and the query needs to be passed back to the closest
preceding finger. We hence compute the probability $h_k(i)$ that
it is passed back to the $(k-i)^{th}$ finger either because the
intervening fingers are dead or share the same finger table entry as
the $k^{th}$ finger. The cost of the lookup now depends on the
remaining distance to the sought key. The expression for $h_k(i)$ is
easy to compute using theorem 3.1 and the expression for the
$f_k$'s [?].

The cost for general lookups is hence 



The lookup equation is solved recursively, given the
coefficients and $C_1$. We plot the result in figure 3. The theoretical result
matches the simulation very well.

4 Discussion and Conclusion 

We now discuss a broader issue, connected with churn, which
arises naturally in the context of our analysis. As we mentioned
earlier, all our analysis is performed in the steady state where
the rate of joins is the same as the rate of departures. However
this rate itself can be chosen in different ways. While we
expect the mean behaviour to be the same in all these cases, the
fluctuations are very different with consequent implications for
the functioning of DHTs. The case where fluctuations play the
least role is when the join rate is "per-network" (the number
of joinees does not depend on the current number of nodes in
the network) and the failure rate is "per-node" (the number of
failures does depend on the current number of occupied nodes).
In this case, the steady state condition is $\lambda_j/N = \lambda_f$,
guaranteeing that $N$ cannot deviate too much from the steady state
value. In the two other cases where the join and failure rate
are both per-network or (as in the case considered in this
paper) both per-node, there is no such "repair" mechanism, and
a large fluctuation can (and will) drive the number of nodes
to extinction, causing the DHT to die. In the former case, the
time-to-die scales with the number of nodes as $\sim N^3$ while in
the latter case it scales as $\sim N^2$ [?]. Which of these 'types' of
churn is the most relevant? We imagine that this depends on
the application and it is hence probably of importance to study
all of them in detail.
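
The qualitative difference between these types of churn can be seen in a toy simulation of the number of nodes alone. The Python sketch below is only an illustration under simplifying assumptions: it follows the sequence of join/fail events without tracking time, it covers two of the three cases (per-network joins with per-node failures, and both per-node), and the function name and parameters are hypothetical.

```python
import random


def simulate_population(n0, events, join_per_network, seed=0):
    """Number of nodes after a sequence of join/fail events.

    join_per_network=True : joins arrive at a fixed total rate n0
                            (in units of the per-node failure rate).
    join_per_network=False: every alive node triggers joins at the
                            per-node failure rate, so both rates scale with n.
    """
    rng = random.Random(seed)
    n = n0
    for _ in range(events):
        if n == 0 and not join_per_network:
            break  # extinct: no node is left to trigger a join
        join_rate = n0 if join_per_network else n
        fail_rate = n  # failures are always per-node
        if rng.random() < join_rate / (join_rate + fail_rate):
            n += 1
        else:
            n -= 1
    return n


print(simulate_population(100, 20_000, join_per_network=True))
print(simulate_population(100, 20_000, join_per_network=False))
```

In the first case the join probability rises whenever $N$ drops below its steady-state value, which pulls the population back. In the second case each event is a fair coin flip, so the population performs an unbiased random walk that can reach zero.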

To summarize, in this paper, we have presented a detailed
theoretical analysis of a DHT-based P2P system, Chord,
using a master-equation formalism. This analysis differs from
existing theoretical work done on DHTs in that it aims not at
establishing bounds, but at precise determination of the
relevant quantities in this dynamically evolving system. From the
match of our theory and the simulations, it can be seen that our
predictions are accurate to within 1% in most cases.

Apart from the usefulness of this approach for its own sake,
we can also gain some new insights into the system from it.
For example, we see that the fraction of dead finger pointers
$f_k$ is an increasing function of the length of the finger. In fact,
for large enough $\mathcal{K}$, all the long fingers will be dead most of
the time, making routing very inefficient. This implies that we
need to consider a different stabilization scheme for the fingers
(such as, perhaps, stabilizing the longer fingers more often than
the shorter ones), in order that the DHT continues to function
at high churn rates. We also expect that we can use this analysis
to understand and analyze other DHTs.